Fax: An Alternative to SGML
Authors
Abstract
We have argued elsewhere (Church and Mercer, 1993) that text is more available than ever before, and that the availability of massive quantities of data has been responsible for much of the recent interest in text analysis. Ideally, we would hope that this data would be distributed in a convenient format such as SGML (Goldfarb, 1990), but in practice, we usually have to work with the data in whatever format it happens to be in, since we usually aren’t in much of a position to tell the data providers how to do their business. Recently, we have been working with a collection of 15,000 AT&T internal documents (500,000 pages or 100 million words). Unfortunately, this data is stored in a particularly inconvenient format: fax.
Similar resources
Another Look at LaTeX to SGML Conversion
Publishers are starting to use SGML as their permanent form of storage for documents. How can LaTeX files be converted to an SGML instance? This paper discusses possible strategies, and describes an implementation by Elsevier Science of a system based on conversion in TeX itself, and extraction of SGML code from the dvi file.
SGML and XML as interchange formats for HL7 messages
OBJECTIVE To report on the use of SGML and XML (a proper subset of SGML) as transfer syntaxes for HL7 Version 2.3 and Version 3.0 messages. METHODS The methodology has focused largely on two questions: Can it be done? How best to do it? The first question is addressed by attempting to build an SGML/XML representation of HL7 messages. The second question requires a consideration of several met...
On the Interchangeability of SGML and ODA
SGML and ODA are international standards for the markup and interchange of electronic documents. These standards are incompatible, in the sense that in general a document encoded using SGML cannot be used directly in an ODA-based system, and vice versa. We first describe these two standards, and suggest criteria under which a bridge between the two standards could be evaluated. We then evaluate...
SGML-Lite: An SGML-based Programming Environment
Literate Programming is a documentation method that attempts to maintain consistency among the various design and program documents of a software system. Unfortunately the majority of the literate programming tools do not have appropriate user interfaces and require the users to learn complicated and cryptic tagging languages. SGML is a metalanguage used to specify markup or tagging languages t...
Demand More from Your SGML Database! Bringing SQL under the SGML Limelight
Have you ever been frustrated by how inadequate SGML databases are in terms of searching or querying your documents? With the current state of the art, you will easily be able to search for a word, phrase, or keywords in the whole document. Some systems let you perform approximate searches or regular expression searches. Even fewer systems let you search for keywords or phrases in certain SGML ...
Publication date: 1994